A Framework for Measuring the Impact of Web Spam

Authors

  • Timothy Jones
  • David Hawking
  • Ramesh Sankaranarayana
Abstract

Web spam potentially causes three deleterious effects: unnecessary work for crawlers and search engines; diversion of traffic away from legitimate businesses; and annoyance to search engine users through poorer results. Past research on web spam has focused on spamming techniques, spam suppression techniques, and methods for classifying web content as spam or non-spam. Here we focus on the deterioration of search result quality caused by the presence of spam in a country-scale web. We present a framework for measuring the degradation in quality of search results caused by the presence of web spam. We index the 80-million-page UK2006 web spam collection on one machine. We trial the proposed framework in an experiment with the UK2006 collection and demonstrate that simple removal of spam pages from result sets can increase result quality. We conclude that the framework is a reasonable vehicle for research in this area and outline changes necessary for planned future experiments.
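The claim that removing spam pages from result sets can raise result quality can be illustrated with a minimal sketch. The abstract does not name the quality measure used, so precision at k is an assumption here, and the function names, document identifiers, spam labels, and relevance judgments are all hypothetical.

```python
from typing import List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranking[:k]
    return sum(1 for doc in top if doc in relevant) / k


def remove_spam(ranking: List[str], spam: Set[str]) -> List[str]:
    """Drop pages labelled as spam, preserving the order of the rest."""
    return [doc for doc in ranking if doc not in spam]


# Hypothetical ranked result list, spam labels, and relevance judgments.
ranking = ["a", "s1", "b", "s2", "c", "d"]
spam = {"s1", "s2"}
relevant = {"a", "b", "c"}

# Compare quality of the original result list with the spam-filtered one.
before = precision_at_k(ranking, relevant, k=3)
after = precision_at_k(remove_spam(ranking, spam), relevant, k=3)
print(f"P@3 before: {before:.2f}, after: {after:.2f}")
```

In this toy data, filtering lets lower-ranked legitimate pages move up into the top k, which is one way the effect described in the abstract could show up in a measurement.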

Similar resources

Measuring the Impact of Non-economic Exogenous Factors on Industrial Energy Demand in Iran

This paper tries to demonstrate the importance of non-economic exogenous factors (Underlying Energy Demand Trend) when we estimate the industrial energy demand for Iran. The Structural Time Series Model (STSM) approach is used to model these unobservable factors. The Kalman filter and Maximum Likelihood methods allow us to estimate the value of the UEDT. This approach enables us to obtain more ...

Research Note - The Cost Impact of Spam Filters: Measuring the Effect of Information System Technologies in Organizations

More than 70% of global e-mail traffic consists of unsolicited and commercial direct marketing, also known as spam. Dealing with spam incurs high costs for organizations, prompting efforts to try to reduce spam-related costs by installing spam filters. Using modern econometric methods to re...

A framework for Measuring the Dynamics Connections of Volatility in Oil and Financial Markets

Investigating connections between financial and oil markets is important for investors and policy makers. This knowledge allows for appropriate decision making. In this paper, we measure the dynamic connections of selected stock markets in the Middle East with oil markets, gold, dollar index and euro-dollar and pound-dollar exchange rates during the period February 2007 to August 2019 in networ...

Information Assurance: Detection of Web Spam Attacks in Social Media

As online social media applications continue to gain popularity, concerns about the rapid proliferation of Web spam have grown in recent years. These applications allow spammers to submit links anonymously, diverting unsuspecting users to spam Web sites. This paper presents a novel co-classification framework to simultaneously detect Web spam and spammers in social media Web sites based on th...

Lived experience Consumers in online stores based on the Stimulator-Organism-Response Framework (SOR)

In this study, based on the stimulus-organism-response framework (SOR), to develop a comprehensive framework of consumer experience in the field of online retailers, examining the impact of online store environment elements (web quality and brand Web site) as forecasting for emotional responses and cognitive (trust and perceived risk) and behavioral responses of consumers (want to buy) are disc...

Publication date: 2007